Closed
Conversation
Rust makes the current file available as a statically-allocated string, but without a NUL terminator. Allow this by storing an optional maximum length in the Error. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function name is not available in Rust, so make it optional. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide an implementation of std::error::Error that bridges the Rust
anyhow::Error and std::panic::Location types with QEMU's Error*.
It also has several utility methods, analogous to error_propagate(),
that convert a Result into a return value + Error** pair. One important
difference is that these propagation methods *panic* if *errp is NULL,
unlike error_propagate() which eats subsequent errors[1]. The reason
for this is that in C you have an error_set*() call at the site where
the error is created, and calls to error_propagate() are relatively rare.
In Rust instead, even though these functions do "propagate" a
qemu_api::Error into a C Error**, there is no error_setg() anywhere that
could check for non-NULL errp and call abort(). error_propagate()'s
behavior of ignoring subsequent errors is generally considered weird,
and there would be a bigger risk of triggering it from Rust code.
[1] This is actually a violation of the preconditions of error_propagate(),
so it should not happen. But you never know...
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the need to convert after every read of the BqlCell. Because the vmstate uses a u8 as the size of the VARRAY, this requires switching the VARRAY to use num_timers_save; which in turn requires ensuring that the num_timers_save is always there. For simplicity do this by removing support for version 1, which QEMU has not been producing for ~15 years. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No functional change intended. Suggested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not silently adjust num_timers, and fail if intcap is 0. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Match the code in hpet.c; this also allows removing the BqlCell from the num_timers field. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that the num_timers field is initialized as a property, someone may change its default value using qdev_prop_set_uint8(), but the value is fixed after the Rust code sees it first. Since there is no need to modify it after realize(), it is not to be necessary to have a BqlCell wrapper. Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250520152750.2542612-4-zhao1.liu@intel.com [Remove .into() as well. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
error is new; offset_of is gone. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the enum includes values such as "Ok", "Err", or "Error", the TryInto macro can cause errors. Be careful and qualify identifiers with the full path, or in the case of TryFrom<>::Error do not use the associated type at all. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A page state change is typically followed by an access of the page(s) and results in another VMEXIT in order to map the page into the nested page table. Depending on the size of page state change request, this can generate a number of additional VMEXITs. For example, under SNP, when Linux is utilizing lazy memory acceptance, memory is typically accepted in 4M chunks. A page state change request is submitted to mark the pages as private, followed by validation of the memory. Since the guest_memfd currently only supports 4K pages, each page validation will result in VMEXIT to map the page, resulting in 1024 additional exits. When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the size of the page state change in order to pre-map the pages and avoid the additional VMEXITs. This helps speed up boot times. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
futex(2) - Linux manual page https://man7.org/linux/man-pages/man2/futex.2.html > Note that a wake-up can also be caused by common futex usage patterns > in unrelated code that happened to have previously used the futex > word's memory location (e.g., typical futex-based implementations of > Pthreads mutexes can cause this under some conditions). Therefore, > callers should always conservatively assume that a return value of 0 > can mean a spurious wake-up, and use the futex word's value (i.e., > the user-space synchronization scheme) to decide whether to continue > to block or not. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-1-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Windows supports futex-like APIs since Windows 8 and Windows Server 2012. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-2-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
scripts/checkpatch.pl warns for __linux__ saying "architecture specific defines should be avoided". Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250526-event-v4-4-5b784cc8e1de@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu-thread used to abstract pthread primitives into futex for the QemuEvent implementation of POSIX systems other than Linux. However, this abstraction has one key difference: unlike futex, pthread primitives require an explicit destruction, and it must be ordered after wait and wake operations. It would be easier to perform destruction if a wait operation ensures the corresponding wake operation finishes as POSIX semaphore does, but that requires to protect state accesses in qemu_event_set() and qemu_event_wait() with a mutex. On the other hand, real futex does not need such a protection but needs complex barrier and atomic operations to ensure ordering between the two functions. Add special implementations of qemu_event_set() and qemu_event_wait() using pthread primitives. qemu_event_wait() will ensure qemu_event_set() finishes, and these functions will avoid complex barrier and atomic operations to ensure ordering between them. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Tested-by: Phil Dennis-Jordan <phil@philjordan.eu> Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu> Link: https://lore.kernel.org/r/20250526-event-v4-5-5b784cc8e1de@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the futex-based implementation of QemuEvent on Windows to remove code duplication and remove the overhead of event object construction and destruction. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250526-event-v4-6-5b784cc8e1de@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This unlocks the futex-based implementation of QemuLockCnt to Windows. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-6-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Document QemuEvent to help choose an appropriate synchronization primitive. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-12-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
pause_event can utilize qemu_event_reset() to discard events. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250529-event-v5-7-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
colo_exit_sem and colo_incoming_sem represent one-shot events so they can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250529-event-v5-8-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
thread_sync_sem is an one-shot event so it can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250529-event-v5-9-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
sem in AppleGFXReadMemoryJob is an one-shot event so it can be converted into QemuEvent, which is more specialized for such a use case. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250529-event-v5-10-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Intel SDM section 10.2.3.3 on the MXCSR.FTZ bit says that we flush outputs to zero when we detect underflow, which is after rounding. Set the detect_ftz flag accordingly. This allows us to enable the test in fma.c which checks this behaviour. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250519145114.2786534-2-peter.maydell@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The softfloat get_float_exception_flags() function returns 'int', but in various places in target/i386 we incorrectly store the returned value into a uint8_t. This currently has no ill effects because i386 doesn't care about any of the float_flag enum values above 0x40. However, we want to start using float_flag_input_denormal_used, which is 0x4000. Switch to using 'int' so that we can handle all the possible valid float_flag_* values. This includes changing the return type of save_exception_flags() and the argument to merge_exception_flags(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250519145114.2786534-3-peter.maydell@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The x86 DE bit in the FPU and MXCSR status is supposed to be set when an input denormal is consumed. We didn't previously report this from softfloat, so the x86 code either simply didn't set the DE bit or else incorrectly wired it up to denormal_flushed, depending on which register you looked at. Now we have input_denormal_used we can wire up these DE bits with the semantics they are supposed to have. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Link: https://lore.kernel.org/r/20250519145114.2786534-4-peter.maydell@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add some fma test cases that check for correct handling of FTZ and for the flag that indicates that the input denormal was consumed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250519145114.2786534-5-peter.maydell@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
… staging Python Pull Request Add QAPI and QAPI doc files to python static analysis testing regime, this time for real, probably # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAmhB388ACgkQfe+BBqr8 # OQ6lMA//WJtSr57ADW5k5zcRMxV7k//erYFkjgXbTh7b9DDblMwNVhYr5lqJbEvS # V5OChW32++QIO5Y4cBhzbzxFTJXbAYzyg3UATCkH2kRbd139bqdAtsnsaFmoHmLP # c8KAggT1+hIb7JIVkFiFccMsdCeFwXwQoS5Nk7w95H9cxxYUj/O9qbRuCN+elg/e # mX4zaq6F2umTx0EdD35DlBPrPPyRsdlVWKUqh8f5KaAGPOelGyvbgwrXU2MT7ewG # JXcRoYzn/9J2KSboiFY0MjIKqDuhoMdCnbSNpRNGgClJRa+VZEBPFClMe1YSXw0m # J3kQMYeqm5S1GUG+ZrBTICY6Ch8jNq2kb3ua707JJWdYmd9gq0poF/P7gaRVbyAL # 5UdYVVgtH/3xve2LGe0guj3v5kTK7Vo6dApwj8pRHrBWWOgAG0UgGseOJgndfCIx # PQRsF2T4YoVdjiGB46EIgBmoFI+VJGwFRlvb6WZ0YmPedi7MuUvWmo0lbgDkaTO+ # MMqsWxShTY+xwnSFgtl1iHOAdfT6jiHcn1n+hZrGpvF492XRjW02zKiDSZECqSz5 # lg51+OaDc2HwS65sYyFb4GD7yF/PcdOj7MG/Ij9dx0GoM9/HmcVAHyRt45QNgxzc # N7Xx6GFGs7puDoE/pSoauFtGC8XeR6Cx0HfBcXYGaJcJEq6N4yw= # =IVAr # -----END PGP SIGNATURE----- # gpg: Signature made Thu 05 Jun 2025 14:19:59 EDT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * tag 'python-pull-request' of https://gitlab.com/jsnow/qemu: qapi: delete un-needed python static analysis configs python: Drop redundant warn_unused_configs = True python: add qapi static analysis tests python: update missing dependencies from minreqs docs/qapidoc: linting fixes qapi: Add some pylint ignores Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* futex: support Windows * qemu-thread: Avoid futex abstraction for non-Linux * migration, hw/display/apple-gfx: replace QemuSemaphore with QemuEvent * rust: bindings for Error * hpet, rust/hpet: return errors from realize if properties are incorrect * rust/hpet: Drop BqlCell wrapper for num_timers * target/i386: Emulate ftz and denormal flag bits correctly * i386/kvm: Prefault memory on page state change # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhC4AgUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroP09wf+K9e0TaaZRxTsw7WU9pXsDoYPzTLd # F5CkBZPY770X1JW75f8Xw5qKczI0t6s26eFK1NUZxYiDVWzW/lZT6hreCUQSwzoS # b0wlAgPW+bV5dKlKI2wvnadrgDvroj4p560TS+bmRftiu2P0ugkHHtIJNIQ+byUQ # sWdhKlUqdOXakMrC4H4wDyIgRbK4CLsRMbnBHBUENwNJYJm39bwlicybbagpUxzt # w4mgjbMab0jbAd2hVq8n+A+1sKjrroqOtrhQLzEuMZ0VAwocwuP2Adm6gBu9kdHV # tpa8RLopninax3pWVUHnypHX780jkZ8E7zk9ohaaK36NnWTF4W/Z41EOLw== # =Vs6V # -----END PGP SIGNATURE----- # gpg: Signature made Fri 06 Jun 2025 08:33:12 EDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (31 commits) tests/tcg/x86_64/fma: add test for exact-denormal output target/i386: Wire up MXCSR.DE and FPUS.DE correctly target/i386: Use correct type for get_float_exception_flags() values target/i386: Detect flush-to-zero after rounding hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent migration/postcopy: Replace QemuSemaphore with QemuEvent migration/colo: Replace QemuSemaphore with QemuEvent migration: Replace QemuSemaphore with QemuEvent qemu-thread: Document QemuEvent qemu-thread: Use futex if available for QemuLockCnt qemu-thread: Use futex for QemuEvent on Windows qemu-thread: Avoid futex abstraction for non-Linux qemu-thread: Replace __linux__ with CONFIG_LINUX futex: Support Windows futex: Check value after qemu_futex_wait() i386/kvm: Prefault memory on page state change rust: make TryFrom macro more resilient docs: update Rust module status rust/hpet: Drop BqlCell wrapper for num_timers rust/hpet: return errors from realize if properties are incorrect ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
…rd() Update ReplayRamDiscard() function to return the result and unify the ReplayRamPopulate() and ReplayRamDiscard() to ReplayRamDiscardState() at the same time due to their identical definitions. This unification simplifies related structures, such as VirtIOMEMReplayData, which makes it cleaner. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-4-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
… with guest_memfd Commit 852f004 ("RAMBlock: make guest_memfd require uncoordinated discard") highlighted that subsystems like VFIO may disable RAM block discard. However, guest_memfd relies on discard operations for page conversion between private and shared memory, potentially leading to the stale IOMMU mapping issue when assigning hardware devices to confidential VMs via shared memory. To address this and allow shared device assignement, it is crucial to ensure the VFIO system refreshes its IOMMU mappings. RamDiscardManager is an existing interface (used by virtio-mem) to adjust VFIO mappings in relation to VM page assignment. Effectively page conversion is similar to hot-removing a page in one mode and adding it back in the other. Therefore, similar actions are required for page conversion events. Introduce the RamDiscardManager to guest_memfd to facilitate this process. Since guest_memfd is not an object, it cannot directly implement the RamDiscardManager interface. Implementing it in HostMemoryBackend is not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks have a memory backend while others do not. Notably, virtual BIOS RAMBlocks using memory_region_init_ram_guest_memfd() do not have a backend. To manage RAMBlocks with guest_memfd, define a new object named RamBlockAttributes to implement the RamDiscardManager interface. This object can store the guest_memfd information such as the bitmap for shared memory and the registered listeners for event notifications. A new state_change() helper function is provided to notify listeners, such as VFIO, allowing VFIO to do dynamically DMA map and unmap for the shared memory according to conversion events. Note that in the current context of RamDiscardManager for guest_memfd, the shared state is analogous to being populated, while the private state can be considered discarded for simplicity. In the future, it would be more complicated if considering more states like private/shared/discarded at the same time. In current implementation, memory state tracking is performed at the host page size granularity, as the minimum conversion size can be one page per request. Additionally, VFIO expected the DMA mapping for a specific IOVA to be mapped and unmapped with the same granularity. Confidential VMs may perform partial conversions, such as conversions on small regions within a larger one. To prevent such invalid cases and until support for DMA mapping cut operations is available, all operations are performed with 4K granularity. In addition, memory conversion failures cause QEMU to quit rather than resuming the guest or retrying the operation at present. It would be future work to add more error handling or rollback mechanisms once conversion failures are allowed. For example, in-place conversion of guest_memfd could retry the unmap operation during the conversion from shared to private. For now, keep the complex error handling out of the picture as it is not required. Tested-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-5-chenyi.qiang@intel.com [peterx: squash fixup from Chenyi to fix builds] Signed-off-by: Peter Xu <peterx@redhat.com>
A new field, attributes, was introduced in RAMBlock to link to a RamBlockAttributes object, which centralizes all guest_memfd related information (such as fd and status bitmap) within a RAMBlock. Create and initialize the RamBlockAttributes object upon ram_block_add(). Meanwhile, register the object in the target RAMBlock's MemoryRegion. After that, guest_memfd-backed RAMBlock is associated with the RamDiscardManager interface, and the users can execute RamDiscardManager specific handling. For example, VFIO will register the RamDiscardListener and get notifications when the state_change() helper invokes. As coordinate discarding of RAM with guest_memfd is now supported, only block uncoordinated discard. Tested-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-6-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
Currently we have a short paragraph saying that patches must include a Signed-off-by line, and merely link to the kernel documentation. The linked kernel docs have a lot of content beyond the part about sign-off an thus are misleading/distracting to QEMU contributors. This introduces a dedicated 'code-provenance' page in QEMU talking about why we require sign-off, explaining the other tags we commonly use, and what to do in some edge cases. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Files contributed to QEMU are generally expected to be provided in the preferred format for manipulation. IOW, we generally don't expect to have generated / compiled code included in the tree, rather, we expect to run the code generator / compiler as part of the build process. There are some obvious exceptions to this seen in our existing tree, the biggest one being the inclusion of many binary firmware ROMs. A more niche example is the inclusion of a generated eBPF program. Or the CI dockerfiles which are mostly auto-generated. In these cases, however, the preferred format source code is still required to be included, alongside the generated output. Tools which perform user defined algorithmic transformations on code are not considered to be "code generators". ie, we permit use of coccinelle, spell checkers, and sed/awk/etc to manipulate code. Such use of automated manipulation should still be declared in the commit message. One off generators which create a boilerplate file which the author then fills in, are acceptable if their output has clear copyright and license status. This could be where a contributor writes a throwaway python script to automate creation of some mundane piece of code for example. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There has been an explosion of interest in so called AI code generators. Thus far though, this is has not been matched by a broadly accepted legal interpretation of the licensing implications for code generator outputs. While the vendors may claim there is no problem and a free choice of license is possible, they have an inherent conflict of interest in promoting this interpretation. More broadly there is, as yet, no broad consensus on the licensing implications of code generators trained on inputs under a wide variety of licenses The DCO requires contributors to assert they have the right to contribute under the designated project license. Given the lack of consensus on the licensing of AI code generator output, it is not considered credible to assert compliance with the DCO clause (b) or (c) where a patch includes such generated code. This patch thus defines a policy that the QEMU project will currently not accept contributions where use of AI code generators is either known, or suspected. These are early days of AI-assisted software development. The legal questions will be resolved eventually. The tools will mature, and we can expect some to become safely usable in free software projects. The policy we set now must be for today, and be open to revision. It's best to start strict and safe, then relax. Meanwhile requests for exceptions can also be considered on a case by case basis. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
…rx/qemu into staging Migration / Memory pull - Yanfei's optimization to skip log_clear during completion - Fabiano's cleanup to remove leftover migration-helpers.c file - Juraj's vnc fix on display pause after migration - Jaehoon's cpr test fix on possible race of server establishment - Chenyi's initial support on vfio enablement for guest-memfd # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCaFmzWhIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wbWYQD/dz08tyaL2J4EHESfBsW4Z1rEggVOM0cB # hlXnvzf/Pb4A/0X3Hn18bOxfPAZOr8NggS5AKgzCCYVeQEWQA2Jj8hwC # =kcTN # -----END PGP SIGNATURE----- # gpg: Signature made Mon 23 Jun 2025 16:04:42 EDT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [full] # gpg: aka "Peter Xu <peterx@redhat.com>" [full] # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu: physmem: Support coordinated discarding of RAM with guest_memfd ram-block-attributes: Introduce RamBlockAttributes to manage RAMBlock with guest_memfd memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard() memory: Change memory_region_set_ram_discard_manager() to return the result memory: Export a helper to get intersection of a MemoryRegionSection with a given range migration: Don't sync volatile memory after migration completes tests/migration: Setup pre-listened cpr.sock to remove race-condition. migration: Support fd-based socket address in cpr_transfer_input ui/vnc: Update display update interval when VM state changes to RUNNING tests/qtest: Remove migration-helpers.c migration/ram: avoid to do log clear in the last round Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
… staging linux-user: fix resource leaks in gen-vdso tcg: Add ptr+ofs alternatives to some gvec functions # -----BEGIN PGP SIGNATURE----- # # iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmhZ/LMdHHJpY2hhcmQu # aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV8aCggAtZOamQ0+EMe09u9d # slaeZDlmxHYfb4RXJQasIBi/uHoWY1bFCEWqLnjU41cpNqI7B3yihbS/YQzyI1i/ # fqjATmuhDzer7rZfdtmRdiLi6kY9SuN9tcSVMVU/kxixByPxdYspQBO8hAAQMM1X # ZY5MIR/5nEMN/U0QUMuqd3krsxzglGQl9Dn610ddVGfzluSCKLLMS/m92gaJmz0u # xoLTM29lfdtIA29JPpVY+1X8NJ/vTUeBvy2eXUGHjT11rHsYUzMVGCGbzCLluEzN # V3L/aSkiwrV+wW5M7R6+hySQl65ZVRV+E9BHuln9aDnG4jdzT3conohg2cY9a5jw # m3HqnQ== # =U6ub # -----END PGP SIGNATURE----- # gpg: Signature made Mon 23 Jun 2025 21:17:39 EDT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * tag 'pull-tcg-20250623' of https://gitlab.com/rth7680/qemu: linux-user: fix resource leaks in gen-vdso linux-user/aarch64: Update hwcap bits from 6.14 tcg: Split out tcg_gen_gvec_dup_imm_var tcg: Split out tcg_gen_gvec_{add,sub}_var tcg: Split out tcg_gen_gvec_mov_var tcg: Split out tcg_gen_gvec_3_var tcg: Split out tcg_gen_gvec_2_var tcg: Add base arguments to check_overlap_[234] tcg: Add dbase argument to expand_clr tcg: Add dbase argument to do_dup tcg: Add dbase argument to do_dup_store Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduce basic plumbing for vfio-user with CONFIG_VFIO_USER. We introduce VFIOUserContainer in hw/vfio-user/container.c, which is a container type for the "IOMMU" type "vfio-iommu-user", and share some common container code from hw/vfio/container.c. Add hw/vfio-user/pci.c for instantiating VFIOUserPCIDevice objects, sharing some common code from hw/vfio/pci.c. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Introduce the vfio-user "proxy": this is the client code responsible for sending and receiving vfio-user messages across the control socket. The new files hw/vfio-user/proxy.[ch] contain some basic plumbing for managing the proxy; initialize the proxy during realization of the VFIOUserPCIDevice instance. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Add the basic implementation for receiving vfio-user messages from the control socket. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Add plumbing for sending vfio-user messages on the control socket. Add initial version negotation on connection. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Add support for getting basic device information. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Add support for getting region info for vfio-user. As vfio-user has one fd per region, enable ->use_region_fds. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Re-use PCI setup functions from hw/vfio/pci.c to realize the vfio-user PCI device. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
IRQ setup uses the same semantics as the traditional vfio path, but we need to share the corresponding file descriptors with the server as necessary. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
For vfio-user, the server holds the pending IRQ state; set up an I/O region for the MSI-X PBA so we can ask the server for this state on a PBA read. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
The user container will shortly need access to the underlying vfio-user proxy; set this up. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Hook this call up to the legacy reset handler for vfio-user-pci. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
When the vfio-user container gets mapping updates, share them with the vfio-user by sending a message; this can include the region fd, allowing the server to directly mmap() the region as needed. For performance, we only wait for the message responses when we're doing with a series of updates via the listener_commit() callback. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Unlike most other messages, this is a server->client message, for when a server wants to do "DMA"; this is slow, so normally the server has memory directly mapped instead. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
By default, the vfio-user subsystem will wait 5 seconds for a message reply from the server. Add an option to allow this to be configurable. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Support an asynchronous send of a vfio-user socket message (no wait for a reply) when the write is posted. This is only safe when no regions are mappable by the VM. Add an option to explicitly disable this as well. Signed-off-by: John Levon <john.levon@nutanix.com>
Add new message to send multiple writes to server in a single message. Prevents the outgoing queue from overflowing when a long latency operation is followed by a series of posted writes. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com>
Add some basic documentation on vfio-user usage. Signed-off-by: John Levon <john.levon@nutanix.com>
This patch introduces the vfio-user protocol specification (formerly known as VFIO-over-socket), which is designed to allow devices to be emulated outside QEMU, in a separate process. vfio-user reuses the existing VFIO defines, structs and concepts. It has been earlier discussed as an RFC in: "RFC: use VFIO over a UNIX domain socket to implement device offloading" Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com> Signed-off-by: John Levon <john.levon@nutanix.com>
jlevon
pushed a commit
to jlevon/qemu
that referenced
this pull request
Nov 17, 2025
When the Bus Master bit is disabled in a PCI device's Command Register, the device's DMA address space becomes unassigned memory (i.e. the io_mem_unassigned MemoryRegion). This can lead to deadlocks with IOThreads since io_mem_unassigned accesses attempt to acquire the Big QEMU Lock (BQL). For example, virtio-pci devices deadlock in virtio_write_config() -> virtio_pci_stop_ioeventfd() when waiting for the IOThread while holding the BQL. The IOThread is unable to acquire the BQL but the vcpu thread won't release the BQL while waiting for the IOThread. io_mem_unassigned is trivially thread-safe since it has no state, it simply rejects all load/store accesses. Therefore it is safe to enable lockless I/O on io_mem_unassigned to eliminate this deadlock. Here is the backtrace described above: Thread 9 (Thread 0x7fccfcdff6c0 (LWP 247832) "CPU 4/KVM"): #0 0x00007fcd11529d46 in ppoll () from target:/lib64/libc.so.6 #1 0x000056468a1a9bad in ppoll (__fds=<optimized out>, __nfds=<optimized out>, __timeout=0x0, __ss=0x0) at /usr/include/bits/poll2.h:88 #2 0x000056468a18f9d9 in fdmon_poll_wait (ctx=0x5646c6a1dc30, ready_list=0x7fccfcdfb310, timeout=-1) at ../util/fdmon-poll.c:79 #3 0x000056468a18f14f in aio_poll (ctx=<optimized out>, blocking=blocking@entry=true) at ../util/aio-posix.c:730 #4 0x000056468a1ad842 in aio_wait_bh_oneshot (ctx=<optimized out>, cb=cb@entry=0x564689faa420 <virtio_blk_ioeventfd_stop_vq_bh>, opaque=<optimized out>) at ../util/aio-wait.c:85 #5 0x0000564689faaa89 in virtio_blk_stop_ioeventfd (vdev=0x5646c8fd7e90) at ../hw/block/virtio-blk.c:1644 #6 0x0000564689d77880 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5646c8fd7e08) at ../hw/virtio/virtio-bus.c:264 #7 0x0000564689d780db in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5646c8fd7e08) at ../hw/virtio/virtio-bus.c:256 #8 0x0000564689d7d98a in virtio_pci_stop_ioeventfd (proxy=0x5646c8fcf8e0) at ../hw/virtio/virtio-pci.c:413 #9 virtio_write_config (pci_dev=0x5646c8fcf8e0, address=4, val=<optimized out>, len=<optimized out>) at ../hw/virtio/virtio-pci.c:803 oracle#10 0x0000564689dcb45a in memory_region_write_accessor (mr=mr@entry=0x5646c6dc2d30, addr=3145732, value=value@entry=0x7fccfcdfb528, size=size@entry=2, shift=<optimized out>, mask=mask@entry=65535, attrs=...) at ../system/memory.c:491 oracle#11 0x0000564689dcaeb0 in access_with_adjusted_size (addr=addr@entry=3145732, value=value@entry=0x7fccfcdfb528, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x564689dcb3f0 <memory_region_write_accessor>, mr=0x5646c6dc2d30, attrs=...) at ../system/memory.c:567 oracle#12 0x0000564689dcb156 in memory_region_dispatch_write (mr=mr@entry=0x5646c6dc2d30, addr=addr@entry=3145732, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...) at ../system/memory.c:1554 oracle#13 0x0000564689dd389a in flatview_write_continue_step (attrs=..., attrs@entry=..., buf=buf@entry=0x7fcd05b87028 "", mr_addr=3145732, l=l@entry=0x7fccfcdfb5f0, mr=0x5646c6dc2d30, len=2) at ../system/physmem.c:3266 oracle#14 0x0000564689dd3adb in flatview_write_continue (fv=0x7fcadc0d8930, addr=3761242116, attrs=..., ptr=0xe0300004, len=2, mr_addr=<optimized out>, l=<optimized out>, mr=<optimized out>) at ../system/physmem.c:3296 oracle#15 flatview_write (fv=0x7fcadc0d8930, addr=addr@entry=3761242116, attrs=attrs@entry=..., buf=buf@entry=0x7fcd05b87028, len=len@entry=2) at ../system/physmem.c:3327 oracle#16 0x0000564689dd7191 in address_space_write (as=0x56468b433600 <address_space_memory>, addr=3761242116, attrs=..., buf=0x7fcd05b87028, len=2) at ../system/physmem.c:3447 oracle#17 address_space_rw (as=0x56468b433600 <address_space_memory>, addr=3761242116, attrs=attrs@entry=..., buf=buf@entry=0x7fcd05b87028, len=2, is_write=<optimized out>) at ../system/physmem.c:3457 oracle#18 0x0000564689ff1ef6 in kvm_cpu_exec (cpu=cpu@entry=0x5646c6dab810) at ../accel/kvm/kvm-all.c:3248 oracle#19 0x0000564689ff32f5 in kvm_vcpu_thread_fn (arg=arg@entry=0x5646c6dab810) at ../accel/kvm/kvm-accel-ops.c:53 oracle#20 0x000056468a19225c in qemu_thread_start (args=0x5646c6db6190) at ../util/qemu-thread-posix.c:393 oracle#21 0x00007fcd114c5b68 in start_thread () from target:/lib64/libc.so.6 #22 0x00007fcd115364e4 in clone () from target:/lib64/libc.so.6 Thread 3 (Thread 0x7fcd0503a6c0 (LWP 247825) "IO iothread1"): #0 0x00007fcd114c2d30 in __lll_lock_wait () from target:/lib64/libc.so.6 #1 0x00007fcd114c8fe2 in pthread_mutex_lock@@GLIBC_2.2.5 () from target:/lib64/libc.so.6 #2 0x000056468a192538 in qemu_mutex_lock_impl (mutex=0x56468b432e60 <bql>, file=0x56468a1e26a5 "../system/physmem.c", line=3198) at ../util/qemu-thread-posix.c:94 #3 0x0000564689dc12e2 in bql_lock_impl (file=file@entry=0x56468a1e26a5 "../system/physmem.c", line=line@entry=3198) at ../system/cpus.c:566 #4 0x0000564689ddc151 in prepare_mmio_access (mr=0x56468b433800 <io_mem_unassigned>) at ../system/physmem.c:3198 #5 address_space_lduw_internal_cached_slow (cache=<optimized out>, addr=2, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at ../system/memory_ldst.c.inc:211 #6 address_space_lduw_le_cached_slow (cache=<optimized out>, addr=addr@entry=2, attrs=attrs@entry=..., result=result@entry=0x0) at ../system/memory_ldst.c.inc:253 #7 0x0000564689fd692c in address_space_lduw_le_cached (result=0x0, cache=<optimized out>, addr=2, attrs=...) at /var/tmp/qemu/include/exec/memory_ldst_cached.h.inc:35 #8 lduw_le_phys_cached (cache=<optimized out>, addr=2) at /var/tmp/qemu/include/exec/memory_ldst_phys.h.inc:66 #9 virtio_lduw_phys_cached (vdev=<optimized out>, cache=<optimized out>, pa=2) at /var/tmp/qemu/include/hw/virtio/virtio-access.h:166 oracle#10 vring_avail_idx (vq=0x5646c8fe2470) at ../hw/virtio/virtio.c:396 oracle#11 virtio_queue_split_set_notification (vq=0x5646c8fe2470, enable=0) at ../hw/virtio/virtio.c:534 oracle#12 virtio_queue_set_notification (vq=0x5646c8fe2470, enable=0) at ../hw/virtio/virtio.c:595 oracle#13 0x000056468a18e7a8 in poll_set_started (ctx=ctx@entry=0x5646c6c74e30, ready_list=ready_list@entry=0x7fcd050366a0, started=started@entry=true) at ../util/aio-posix.c:247 oracle#14 0x000056468a18f2bb in poll_set_started (ctx=0x5646c6c74e30, ready_list=0x7fcd050366a0, started=true) at ../util/aio-posix.c:226 oracle#15 try_poll_mode (ctx=0x5646c6c74e30, ready_list=0x7fcd050366a0, timeout=<synthetic pointer>) at ../util/aio-posix.c:612 oracle#16 aio_poll (ctx=0x5646c6c74e30, blocking=blocking@entry=true) at ../util/aio-posix.c:689 oracle#17 0x000056468a032c26 in iothread_run (opaque=opaque@entry=0x5646c69f3380) at ../iothread.c:63 oracle#18 0x000056468a19225c in qemu_thread_start (args=0x5646c6c75410) at ../util/qemu-thread-posix.c:393 oracle#19 0x00007fcd114c5b68 in start_thread () from target:/lib64/libc.so.6 oracle#20 0x00007fcd115364e4 in clone () from target:/lib64/libc.so.6 Buglink: https://issues.redhat.com/browse/RHEL-71933 Reported-by: Peixiu Hou <phou@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20251029185224.420261-1-stefanha@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.